Homotopy-Based Semi-Supervised Hidden Markov Models for Sequence Labeling
Abstract
This paper explores the use of the homotopy method for training a semi-supervised Hidden Markov Model (HMM) for sequence labeling. We provide a novel polynomial-time algorithm to trace the local maximum of the HMM likelihood function from full weight on the labeled data to full weight on the unlabeled data. We present an experimental analysis of different techniques for choosing the best balance between labeled and unlabeled data based on the characteristics observed along this path. Furthermore, experimental results on the field segmentation task in information extraction show that the homotopy-based method significantly outperforms EM-based semi-supervised learning, and provides a more accurate alternative to using held-out data to pick the best balance for combining labeled and unlabeled data.
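The abstract does not state the objective being traced. As a hedged illustration only, one common way to write such a weighted likelihood is shown below; the symbols $\lambda$, $\theta$, $D_l$, and $D_u$ are assumptions introduced here and are not taken from the paper.

```latex
% Assumed notation (not from the source):
%   D_l = labeled sequences (x, y), D_u = unlabeled sequences x,
%   \theta = HMM parameters, \lambda \in [0, 1] = weight on the unlabeled data.
\[
  \mathcal{L}_{\lambda}(\theta)
  = (1-\lambda) \sum_{(x,y) \in D_l} \log P_{\theta}(x, y)
  + \lambda \sum_{x \in D_u} \log \sum_{y} P_{\theta}(x, y),
  \qquad \lambda \in [0, 1].
\]
% At \lambda = 0 only the labeled data contribute (supervised estimate);
% at \lambda = 1 only the unlabeled data contribute (unsupervised estimate).
```

Under this assumed form, a path-following method would start from the maximizer at $\lambda = 0$ and follow a local maximum of $\mathcal{L}_{\lambda}$ as $\lambda$ increases toward 1, which matches the abstract's description of moving from full weight on the labeled data to full weight on the unlabeled data.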
Similar resources
Semi-Supervised Learning of Sequence Models with Method of Moments
We propose a fast and scalable method for semi-supervised learning of sequence models, based on anchor words and moment matching. Our method can handle hidden Markov models with feature-based log-linear emissions. Unlike other semi-supervised methods, no decoding passes are necessary on the unlabeled data and no graph needs to be constructed; only one pass is necessary to collect moment statist...
Semi-Supervised Learning of Sequence Models with the Method of Moments
We propose a fast and scalable method for semi-supervised learning of sequence models, based on anchor words and moment matching. Our method can handle hidden Markov models with feature-based log-linear emissions. Unlike other semi-supervised methods, no decoding passes are necessary on the unlabeled data and no graph needs to be constructed; only one pass is necessary to collect moment statist...
Semi-Supervised Learning for Acoustic
Enormous amounts of audio recordings of human speech are essential ingredients for building reliable statistical models for many speech applications, such as automatic speech recognizers and automatic prosody detectors. However, most of this speech data is not being utilized because it lacks transcriptions. The goal of this thesis is to use untranscribed (unlabeled) data to improve the perfor...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
Semi-unsupervised Weighted Maximum-Likelihood Estimation of Joint Densities for the Co-training of Adaptive Activation Functions
9:40 Yann Soullard and T. Artieres (University Pierre and Marie Curie, Paris, France) Iterative Refinement of HMM and HCRF for Sequence Classification We propose a strategy for semi-supervised learning of Hidden-state Conditional Random Fields (HCRF) for signal classification. It builds on simple procedures for semi-supervised learning of HMMs and on strategies for learning a HCRF from a traine...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008